libfuzzer & LLVM: A First Look


Intro: Analysis of libfuzzer && LLVM

libfuzzer build

Build steps

Environment: Ubuntu 16.04

git clone https://github.com/Dor1s/libfuzzer-workshop.git
cd libfuzzer-workshop
sudo sh checkout_build_install_llvm.sh
sudo apt-get install -y make autoconf automake libtool pkg-config zlib1g-dev
cd libFuzzer
Fuzzer/build.sh

【+】libfuzzer-workshop

While it compiles, let's take a closer look at the memory-error detection algorithm used together with libFuzzer: AddressSanitizer.

【+】AddressSanitizer

【+】AddressSanitizer analysis

【+】AddressSanitizer csdn

The run-time library replaces the malloc and free functions. The memory around malloc-ed regions (red zones) is poisoned. The free-ed memory is placed in quarantine and also poisoned. Every memory access in the program is transformed by the compiler in the following way:

Before: a plain memory access

*address = ...;  // or: ... = *address;

After: the compiler inserts a check

if (IsPoisoned(address)) {
  ReportError(address, kAccessSize, kIsWrite);
}
*address = ...;  // or: ... = *address;

Memory mapping and Instrumentation

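  • Shadow memory and main memory: every 8 bytes of main memory map to 1 shadow byte, MemToShadow(addr) = (addr >> 3) + Offset
  • The compiler inserts the following instrumentation:
    shadow_address = MemToShadow(address);
    if (ShadowIsPoisoned(shadow_address)) {
      ReportError(address, kAccessSize, kIsWrite);
    }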

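Each shadow byte encodes the state of the corresponding 8-byte granule of main memory: 0 means all 8 bytes are addressable, k (1 to 7) means only the first k bytes are addressable, and a negative value means the whole granule is poisoned. The complete check therefore looks like this:

int8_t *shadow_address = MemToShadow(address);
int8_t shadow_value = *shadow_address;
if (shadow_value) {
  if (SlowPathCheck(shadow_value, address, kAccessSize)) {
    ReportError(address, kAccessSize, kIsWrite);
  }
}

// Check the cases where we access first k bytes of the qword
// and these k bytes are unpoisoned.
bool SlowPathCheck(int8_t shadow_value, uintptr_t address, size_t kAccessSize) {
  int last_accessed_byte = (int)(address & 7) + (int)kAccessSize - 1;
  return last_accessed_byte >= shadow_value;  // signed compare: negative shadow always fails
}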

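Fast path: if the shadow byte is 0, the whole 8-byte granule is addressable and the access proceeds without further checks.

Slow path: if the shadow byte is non-zero, SlowPathCheck compares last_accessed_byte (the offset, inside the granule, of the last byte the access touches) with shadow_value (the number of addressable bytes at the start of the granule); the access is reported if it reaches past them.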
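To make the fast path and slow path concrete, here is a small self-contained C++ model of the shadow check. It is a sketch, not ASan's implementation: it assumes a simulated heap starting at address 0, so MemToShadow is just addr >> 3 indexing into a vector (real ASan uses (addr >> 3) + Offset into a reserved shadow region), and the class ToyShadow and its methods are invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of ASan's shadow-memory check (not the real runtime).
// Assumptions: 8-byte granules and a simulated heap that starts at address 0,
// so MemToShadow(addr) is simply addr >> 3 indexing into a vector.
class ToyShadow {
 public:
  // Start with every granule poisoned (-1, i.e. 0xff).
  explicit ToyShadow(std::size_t heap_bytes)
      : shadow_((heap_bytes + 7) / 8, static_cast<std::int8_t>(-1)) {}

  // Mark [addr, addr + size) addressable. addr must be 8-byte aligned and
  // the range must lie inside the simulated heap.
  void Unpoison(std::uintptr_t addr, std::size_t size) {
    std::size_t i = addr >> 3;
    for (; size >= 8; size -= 8) shadow_[i++] = 0;             // full granules
    if (size > 0) shadow_[i] = static_cast<std::int8_t>(size);  // first k bytes only
  }

  // True if an access of access_size bytes (1, 2, 4 or 8) at addr is invalid.
  bool IsBadAccess(std::uintptr_t addr, std::size_t access_size) const {
    std::int8_t shadow_value = shadow_[addr >> 3];  // MemToShadow(addr)
    if (shadow_value == 0) return false;             // fast path
    // Slow path: offset of the last byte touched within the granule.
    int last_accessed_byte =
        static_cast<int>(addr & 7) + static_cast<int>(access_size) - 1;
    return last_accessed_byte >= shadow_value;  // negative shadow: always bad
  }

 private:
  std::vector<std::int8_t> shadow_;
};
```

For example, after Unpoison(0, 13) the shadow bytes are 00 05 ff ff, so a 4-byte read at offset 8 passes the slow path (last byte 3 < 5) while a 1-byte read at offset 13 is reported (5 >= 5). The model also shows the limitation of the fast path: with only 8 bytes unpoisoned, an unaligned 8-byte access at offset 4 reaches into the poisoned second granule, yet only the first granule's shadow byte (0) is consulted, so it is not reported.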

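For some unaligned out-of-bounds accesses that the fast path misses, my first idea was to skip the fast path and always take the slow path. The issue tracker shows that this does work, but the performance cost outweighs the benefit.

Then again, digging further into this part of the algorithm isn't really worth the effort either...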

Stack variables are instrumented like this:

void foo() {
  char redzone1[32];  // 32-byte aligned
  char a[8];          // 32-byte aligned
  char redzone2[24];
  char redzone3[32];  // 32-byte aligned
  int *shadow_base = MemToShadow(redzone1);
  shadow_base[0] = 0xffffffff;  // poison redzone1
  shadow_base[1] = 0xffffff00;  // poison redzone2, unpoison 'a'
  shadow_base[2] = 0xffffffff;  // poison redzone3
  ...
  shadow_base[0] = shadow_base[1] = shadow_base[2] = 0;  // unpoison all
  return;
}
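The shadow words in this example can be derived mechanically. The sketch below (SlotShadowWord is an invented name) computes the 32-bit shadow word for one 32-byte stack slot holding a variable of var_size bytes followed by redzone. It assumes a little-endian target and 0xff as the poison byte, as in the example above; real ASan writes distinct stack magic values (0xf1, 0xf2, 0xf3 for left, middle and right redzones) instead of 0xff.

```cpp
#include <cstddef>
#include <cstdint>

// Shadow word for one 32-byte stack slot: a variable of var_size bytes
// (var_size <= 32) at the start of the slot, the rest is redzone.
// Assumes little-endian packing and 0xff as the poison byte.
std::uint32_t SlotShadowWord(std::size_t var_size) {
  std::uint32_t word = 0;
  for (std::size_t granule = 0; granule < 4; ++granule) {  // 4 x 8 = 32 bytes
    std::size_t start = granule * 8;
    std::uint8_t shadow;
    if (var_size >= start + 8)
      shadow = 0x00;                                         // fully addressable
    else if (var_size > start)
      shadow = static_cast<std::uint8_t>(var_size - start);  // first k bytes
    else
      shadow = 0xff;                                         // poisoned redzone
    word |= static_cast<std::uint32_t>(shadow) << (8 * granule);
  }
  return word;
}
```

SlotShadowWord(8) gives 0xffffff00, the value stored into shadow_base[1] for char a[8]; a 13-byte variable would give 0xffff0500 (one full granule, then a granule with 5 addressable bytes).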

AddressSanitizer source code analysis (AddressSanitizerModule::runOnModule in LLVM's lib/Transforms/Instrumentation/AddressSanitizer.cpp):

bool AddressSanitizerModule::runOnModule(Module &M) {
  C = &(M.getContext());
  int LongSize = M.getDataLayout().getPointerSizeInBits();
  IntptrTy = Type::getIntNTy(*C, LongSize);
  TargetTriple = Triple(M.getTargetTriple());
  Mapping = getShadowMapping(TargetTriple, LongSize, CompileKernel);
  initializeCallbacks(M);

  if (CompileKernel)
    return false;

  // Create a module constructor. A destructor is created lazily because not all
  // platforms, and not all modules need it.
  std::string VersionCheckName =
      kAsanVersionCheckNamePrefix + std::to_string(GetAsanVersion(M));
  std::tie(AsanCtorFunction, std::ignore) = createSanitizerCtorAndInitFunctions(
      M, kAsanModuleCtorName, kAsanInitName, /*InitArgTypes=*/{},
      /*InitArgs=*/{}, VersionCheckName);

  bool CtorComdat = true;
  bool Changed = false;
  // TODO(glider): temporarily disabled globals instrumentation for KASan.
  if (ClGlobals) {
    IRBuilder<> IRB(AsanCtorFunction->getEntryBlock().getTerminator());
    Changed |= InstrumentGlobals(IRB, M, &CtorComdat);
  }

  // Put the constructor and destructor in comdat if both
  // (1) global instrumentation is not TU-specific
  // (2) target is ELF.
  if (UseCtorComdat && TargetTriple.isOSBinFormatELF() && CtorComdat) {
    AsanCtorFunction->setComdat(M.getOrInsertComdat(kAsanModuleCtorName));
    appendToGlobalCtors(M, AsanCtorFunction, kAsanCtorAndDtorPriority,
                        AsanCtorFunction);
    if (AsanDtorFunction) {
      AsanDtorFunction->setComdat(M.getOrInsertComdat(kAsanModuleDtorName));
      appendToGlobalDtors(M, AsanDtorFunction, kAsanCtorAndDtorPriority,
                          AsanDtorFunction);
    }
  } else {
    appendToGlobalCtors(M, AsanCtorFunction, kAsanCtorAndDtorPriority);
    if (AsanDtorFunction)
      appendToGlobalDtors(M, AsanDtorFunction, kAsanCtorAndDtorPriority);
  }

  return Changed;
}

In short, the module pass picks the shadow mapping for the target, creates the module constructor (asan.module_ctor) that calls __asan_init, instruments global variables, and registers the constructor and destructor.

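Summary

  • Memory error detection is built around instrumentation of the redzones (RZ) placed around objects
  • The state of memory is mirrored in shadow memory, and every access checks that state

libFuzzer seems to have finished compiling, let's go take a look.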

radamsa

【+】radamsa

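Things I ran into while learning libFuzzer:

  • libFuzzer works on a corpus: a directory of input files that seeds the fuzzing and collects newly found interesting inputs
  • The data passed to LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) is generated by the fuzzer, but its maximum size has to be set yourself with -max_len.
  • A guide to libFuzzer options
  • The flags used at compile time can be looked up with clang -help
  • -fsanitize=address: enables AddressSanitizer
  • -fsanitize-coverage=trace-pc-guard: provides code coverage information to libFuzzer
  • Seed: 1608565063 in the output is the random seed used for this run (it can be passed back with -seed)
  • -max_len is not provided, using 64: -max_len sets the maximum input length; in the version used here it defaulted to 64
  • ASAN_OPTIONS=symbolize=1 ./first_fuzzer ./crash-id prints a symbolized stack trace for a crash file
  • In short, to fuzz a program you find an entry function and call it from LLVMFuzzerTestOneInput, which covers the basics. However:
  • I noticed the libFuzzer interface offers a few more hooks, such as LLVMFuzzerCustomMutator.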
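LLVMFuzzerCustomMutator is one of those optional hooks: if the fuzz target defines it, libFuzzer calls it to mutate inputs instead of using its default mutations. Its signature is size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size, size_t MaxSize, unsigned int Seed), and it returns the new size, which must not exceed MaxSize. The body below is only a toy (it flips one bit chosen from Seed) so that it compiles and can be tested without linking libFuzzer.

```cpp
#include <stddef.h>
#include <stdint.h>

// Toy custom mutator: flips a single bit chosen from Seed.
// Data has room for MaxSize bytes; the first Size bytes are the current input.
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed) {
  if (Size == 0) {
    if (MaxSize == 0) return 0;
    Data[0] = static_cast<uint8_t>(Seed);  // grow an empty input to one byte
    return 1;
  }
  size_t pos = Seed % Size;
  Data[pos] ^= static_cast<uint8_t>(1u << (Seed % 8));
  return Size;  // size unchanged, so it never exceeds MaxSize
}
```

A practical mutator usually delegates to LLVMFuzzerMutate(Data, Size, MaxSize), which libFuzzer provides, and adds format-specific fixes (checksums, length fields) on top.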

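Practice: writing the first fuzzer

Code:

A test of the most basic overflow

Compile options:

【+】 -fsanitize=fuzzer: links in the libFuzzer engine (which provides main) and enables coverage instrumentation

【+】 -fsanitize=address: enables AddressSanitizer

【+】 -g: emits full debug information

Run options:

【+】 -seed: sets the random seed

【+】 -max_len: sets the maximum length of Data

【+】 dir: a directory passed as a positional argument is used as the corpus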
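A minimal target of this kind might look like the sketch below. HypotheticalParse is an invented function with a classic stack buffer overflow: once the input starts with "FUZZ", it copies the whole input into an 8-byte buffer, so ASan should report it as soon as the fuzzer produces a longer input with that prefix.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical vulnerable function, invented for this sketch.
static bool HypotheticalParse(const uint8_t *data, size_t size) {
  char buf[8];
  if (size >= 4 && memcmp(data, "FUZZ", 4) == 0) {
    memcpy(buf, data, size);  // BUG: no check that size <= sizeof(buf)
    return buf[0] == 'F';
  }
  return false;
}

// libFuzzer calls this once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  HypotheticalParse(data, size);
  return 0;  // libFuzzer expects 0 here
}
```

With the options listed above this could be built and run as, for example, clang++ -g -fsanitize=address,fuzzer first_fuzzer.cc -o first_fuzzer, followed by mkdir corpus && ./first_fuzzer -seed=1 -max_len=64 corpus.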

The second one

#include <algorithm>
#include <array>
#include <cstdint>
#include <cstring>
#include <string>

constexpr auto kMagicHeader = "ZN_2016";
constexpr std::size_t kMaxPacketLen = 1024;
constexpr std::size_t kMaxBodyLength = 1024 - sizeof(kMagicHeader);

// DummyHash() is not shown in this excerpt.
bool VulnerableFunction2(const uint8_t* data, size_t size, bool verify_hash) {
  if (size < sizeof(kMagicHeader))
    return false;

  std::string header(reinterpret_cast<const char*>(data), sizeof(kMagicHeader));

  std::array<uint8_t, kMaxBodyLength> body;

  if (strcmp(kMagicHeader, header.c_str()))
    return false;

  auto target_hash = data[--size];

  if (size > kMaxPacketLen)
    return false;

  if (!verify_hash)
    return true;

  std::copy(data, data + size, body.data());
  auto real_hash = DummyHash(body);
  return real_hash == target_hash;
}
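Why this function crashes can be worked out from its constants. The length check compares against kMaxPacketLen (1024) while body only holds kMaxBodyLength bytes, and because kMagicHeader is declared constexpr auto it is a const char*, so sizeof(kMagicHeader) is the pointer size (8 on a typical 64-bit target) rather than the string length. The helper names below are invented; the sketch only computes the numbers.

```cpp
#include <cstddef>

// Same constants as VulnerableFunction2. kMagicHeader is a const char*,
// so sizeof(kMagicHeader) is the size of a pointer, not of "ZN_2016".
constexpr auto kMagicHeader = "ZN_2016";
constexpr std::size_t kMaxPacketLen = 1024;
constexpr std::size_t kMaxBodyLength = 1024 - sizeof(kMagicHeader);

// After `--size`, the check `size > kMaxPacketLen` still lets std::copy write
// up to kMaxPacketLen bytes into a body of only kMaxBodyLength bytes.
constexpr std::size_t MaxOverflowBytes() { return kMaxPacketLen - kMaxBodyLength; }

// The copy length is (input length - 1), so it first exceeds body.size()
// when the input is kMaxBodyLength + 2 bytes long.
constexpr std::size_t MinOverflowInputLen() { return kMaxBodyLength + 2; }
```

On an LP64 system this gives up to 8 bytes of overflow and a minimum input length of 1018 bytes. In the original code the input must also begin with "ZN_2016" followed by a NUL byte for the strcmp to match, and verify_hash must be true for the copy to run. Such inputs are longer than the 64-byte default -max_len mentioned earlier, so presumably a larger -max_len or a seed corpus is needed to reach the bug.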

fuzzer_code:

#include "vulnerable_functions.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  VulnerableFunction2(data, size, true);
  VulnerableFunction2(data, size, false);
  return 0;
}

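Tip: a bool parameter is trivial to enumerate, so for coverage it is enough to call the function twice :)

But how does that affect coverage? Does it really cover everything?

Experiment: remove the return true from the original function and change fuzzer.cc to: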

#include "vulnerable_functions.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  VulnerableFunction2(data, size, false);
  return 0;
}

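Coverage is 25; the previous version also reached 25 (both crashed).

Then also remove

if (!verify_hash)

Coverage drops to 24.

That makes it clear: the cov: number is the total count of code blocks (or edges) reached by executing the corpus, so deleting a branch removes one block from the total.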
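The counting behind cov: comes from SanitizerCoverage. With -fsanitize-coverage=trace-pc-guard the compiler inserts a call to __sanitizer_cov_trace_pc_guard(&guard) on every instrumented edge and calls __sanitizer_cov_trace_pc_guard_init(start, stop) for each module's guard array. libFuzzer implements these callbacks itself, so the sketch below (modelled on the example in the clang SanitizerCoverage documentation, with invented counters g_num_guards and g_covered) is only for experimenting in a standalone binary, not for linking into a fuzzer.

```cpp
#include <stdint.h>

// Illustrative counters (not part of any API).
static uint32_t g_num_guards = 0;  // guards handed out by init
static uint32_t g_covered = 0;     // distinct guards reached so far

// Called with each module's array of guards.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  if (start == stop || *start) return;  // initialize each array only once
  for (uint32_t *x = start; x < stop; x++)
    *x = ++g_num_guards;  // unique non-zero ID per guard
}

// Called on every instrumented edge.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // already counted
  ++g_covered;
  *guard = 0;  // count each edge only once
}
```

Counting a guard once and then zeroing it is what turns the result into a number of distinct blocks or edges reached, rather than a number of executions.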

The third one

This one adds a bitwise & on a flags argument:

constexpr std::size_t kZn2016VerifyHashFlag = 0x0001000;

bool VulnerableFunction3(const uint8_t* data, size_t size, std::size_t flags) {
  bool verify_hash = flags & kZn2016VerifyHashFlag;
  return VulnerableFunction2(data, size, verify_hash);
}

So enumerate both cases again (0x1001 has the 0x1000 bit set, 0x00 does not):

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  VulnerableFunction3(data, size, 0x00);
  VulnerableFunction3(data, size, 0x1001);
  return 0;
}

crash:)
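Hard-coding the two flag values works because only one bit matters here. An alternative, sketched below, is to let the fuzzer choose the flags by reading them from the first four input bytes; ConsumeFlags is a hypothetical helper, not part of the workshop code.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kZn2016VerifyHashFlag = 0x0001000;

// Hypothetical helper: decode a big-endian 32-bit flags word from the first
// four input bytes. Returns the number of bytes consumed (0 if the input is
// too short, in which case flags is 0).
std::size_t ConsumeFlags(const std::uint8_t *data, std::size_t size,
                         std::size_t *flags) {
  *flags = 0;
  if (size < 4) return 0;
  for (std::size_t i = 0; i < 4; ++i)
    *flags = (*flags << 8) | data[i];
  return 4;
}
```

The harness would then compute n = ConsumeFlags(data, size, &flags) and call VulnerableFunction3(data + n, size - n, flags). The trade-off is that the fuzzer now has to discover the 0x1000 bit through mutation, while the two fixed calls above exercise both branches on every input.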

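The fourth one

Writing these made me realize that the point of a fuzz harness is to crash, crash, crash, so the harness should be adjusted however needed to reach a crash instead of sticking to a fixed pattern.

On to the fourth one, CVE-2014-0160 (Heartbleed):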

build

tar xzf openssl1.0.1f.tgz
cd openssl1.0.1f/

./config
make clean
make CC="clang -O2 -fno-omit-frame-pointer -g -fsanitize=address -fsanitize-coverage=trace-pc-guard,trace-cmp,trace-gep,trace-div" -j$(nproc)

fuzzer.cc:

// Copyright 2016 Google Inc. All Rights Reserved.
// Licensed under the Apache License, Version 2.0 (the "License");
#include <openssl/ssl.h>
#include <openssl/err.h>
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#ifndef CERT_PATH
# define CERT_PATH
#endif

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  // All of this setup runs again for every input.
  SSL_library_init();
  SSL_load_error_strings();
  ERR_load_BIO_strings();
  OpenSSL_add_all_algorithms();
  // Keep the calls outside assert() so they still run when NDEBUG is defined.
  SSL_CTX *sctx = SSL_CTX_new(TLSv1_method());
  assert(sctx);
  int ok = SSL_CTX_use_certificate_file(sctx, CERT_PATH "server.pem",
                                        SSL_FILETYPE_PEM);
  assert(ok);
  ok = SSL_CTX_use_PrivateKey_file(sctx, CERT_PATH "server.key",
                                   SSL_FILETYPE_PEM);
  assert(ok);
  SSL *server = SSL_new(sctx);
  BIO *sinbio = BIO_new(BIO_s_mem());
  BIO *soutbio = BIO_new(BIO_s_mem());
  SSL_set_bio(server, sinbio, soutbio);  // server now owns both BIOs
  SSL_set_accept_state(server);
  BIO_write(sinbio, Data, Size);
  SSL_do_handshake(server);
  SSL_free(server);  // also frees the BIOs; sctx is never freed
  return 0;
}

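Compile command:

clang++ -g openssl_fuzzer.cc -O2 -fno-omit-frame-pointer -fsanitize=address -fsanitize-coverage=trace-pc-guard,trace-cmp,trace-gep,trace-div -I openssl1.0.1f/include openssl1.0.1f/libssl.a openssl1.0.1f/libcrypto.a ../../libFuzzer/libFuzzer.a -o openssl_fuzzer

Several runs ended with OOM and memory-leak reports? Most likely this comes from the harness itself: it calls SSL_CTX_new() and reloads the certificate and key on every input but never frees the SSL_CTX, so memory grows with each iteration. Workarounds:

  • disable leak detection
  • raise the memory limit
    ./openssl_fuzzer -detect_leaks=0 -rss_limit_mb=4096

It took a whole minute to hit the crash, why???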

What if the initialization calls are moved out into a separate function?

// Copyright 2016 Google Inc. All Rights Reserved.
// Licensed under the Apache License, Version 2.0 (the "License");
#include <openssl/ssl.h>
#include <openssl/err.h>
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#ifndef CERT_PATH
# define CERT_PATH
#endif

SSL_CTX *Init() {
  SSL_library_init();
  SSL_load_error_strings();
  ERR_load_BIO_strings();
  OpenSSL_add_all_algorithms();
  // Keep the calls outside assert() so they still run when NDEBUG is defined.
  SSL_CTX *sctx = SSL_CTX_new(TLSv1_method());
  assert(sctx);
  /* These two files were created with this command:
      openssl req -x509 -newkey rsa:512 -keyout server.key \
      -out server.pem -days 9999 -nodes -subj /CN=a/
  */
  int ok = SSL_CTX_use_certificate_file(sctx, CERT_PATH "server.pem",
                                        SSL_FILETYPE_PEM);
  assert(ok);
  ok = SSL_CTX_use_PrivateKey_file(sctx, CERT_PATH "server.key",
                                   SSL_FILETYPE_PEM);
  assert(ok);
  return sctx;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  static SSL_CTX *sctx = Init();  // runs only on the first call
  SSL *server = SSL_new(sctx);
  BIO *sinbio = BIO_new(BIO_s_mem());
  BIO *soutbio = BIO_new(BIO_s_mem());
  SSL_set_bio(server, sinbio, soutbio);
  SSL_set_accept_state(server);
  BIO_write(sinbio, Data, Size);
  SSL_do_handshake(server);
  SSL_free(server);
  return 0;
}

Five seconds???

So what exactly is it that affects fuzzing efficiency...
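The most likely answer is the per-input setup cost: the first harness runs SSL_library_init() and friends, creates a new SSL_CTX and reads server.pem and server.key from disk for every single input (and leaks the context), while the second does all of that once through a function-local static. The sketch below isolates that pattern with an invented ExpensiveInit() and counter g_init_calls; libFuzzer's optional LLVMFuzzerInitialize(int *argc, char ***argv) hook is another place for one-time setup.

```cpp
#include <stddef.h>
#include <stdint.h>

// Illustrative counter of how often the expensive setup runs.
static int g_init_calls = 0;

// Stand-in for Init(): imagine SSL_library_init() plus loading
// server.pem and server.key from disk here.
struct FakeContext { int id; };
static FakeContext *ExpensiveInit() {
  ++g_init_calls;
  static FakeContext ctx{42};
  return &ctx;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // A function-local static is initialized on the first call only (and
  // thread-safely since C++11), so the setup cost is paid once per process.
  static FakeContext *ctx = ExpensiveInit();
  (void)ctx;
  (void)data;
  (void)size;
  return 0;
}
```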

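LLVM Pass

Consider this part a detour. While reading the ASan source I realized that being able to write LLVM passes will be useful for building my own fuzzing framework later, so today, besides continuing with the ASan source, I want to learn how to write LLVM passes, with the goal of getting comfortable with ModulePass and FunctionPass.

References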

http://www.voidcn.com/article/p-mgwevrjr-brn.html
http://llvm.org/docs/WritingAnLLVMPass.html#the-modulepass-class
https://zhuanlan.zhihu.com/p/26129264
https://blog.csdn.net/Mr_Megamind/article/details/78896717

Environment setup

sudo apt-get install git
mkdir LLVM && cd LLVM
git clone https://github.com/llvm-mirror/llvm.git
cd llvm

cd tools
git clone https://github.com/llvm-mirror/clang.git
cd ..
mkdir build && cd build

# the following assumes the LLVM directory was created in the home directory
cd ~
mkdir KLLVM
cd LLVM/llvm/build
cmake -DLLVM_TARGETS_TO_BUILD=host -DCMAKE_INSTALL_PREFIX=~/KLLVM -DCMAKE_BUILD_TYPE=MinSizeRel -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=WebAssembly -DLLVM_INCLUDE_EXAMPLES=OFF -DLLVM_INCLUDE_TESTS=OFF -DCLANG_INCLUDE_TESTS=OFF ..

cmake --build . --target install -- -j3

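Using clang

hello.c

#include <stdio.h>

int main() {
  printf("hello world\n");
  return 0;
}

Compile it

clang hello.c -o hello

Emit LLVM bitcode

clang -O3 -emit-llvm hello.c -c -o hello.bc

The -emit-llvm option can be combined with -S or -c to produce a .ll or a .bc file respectively. Both contain LLVM IR: the former is human-readable text, the latter is binary bitcode.

Run the .bc with lli

lli hello.bc

Disassemble the .bc with llvm-dis

llvm-dis < hello.bc

Compile the .bc to assembly (.s) with llc

llc hello.bc -o hello.s

Common compile options:

  • -c: run only preprocessing, compilation and assembly, producing an object file (no linking)
  • -S: run only preprocessing and compilation, producing assembly code
  • -O<n>: optimization level
  • -emit-llvm: emit LLVM IR instead of native code; combine it with -c or -S, it cannot be used when linking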

LLVM IR

https://releases.llvm.org/2.6/docs/tutorial/JITTutorial1.html
https://releases.llvm.org/2.6/docs/LangRef.html

I'll leave this as a placeholder for now and come back to it after the New Year.

Back to fill it in:

Identifiers

  • Global identifiers (global variables, functions): prefixed with @
  • Named local values: % followed by a name
  • Unnamed local values: % followed by a number
  • Comments start with ;
  • If the result of a computation is not assigned to a named value, an unnamed temporary is created.
  • Unnamed temporaries are numbered sequentially.
High Level Structure
  • A Module is LLVM's translation unit
  • Each module contains functions, global variables and symbol table entries
  • Modules can be combined with the LLVM linker
  • Functions and global variables are both global values
Instructions
  • ret:

    ret <type> <value>

    ret i32 5                       ; Return an integer value of 5
    ret void                        ; Return from a void function
    ret { i32, i8 } { i32 4, i8 2 } ; Return a struct of values 4 and 2
  • br:

    br i1 <cond>, label <iftrue>, label <iffalse>

    Test:
      %cond = icmp eq i32 %a, %b
      br i1 %cond, label %IfEqual, label %IfUnequal
    IfEqual:
      ret i32 1
    IfUnequal:
      ret i32 0
  • switch:

    switch <intty> <value>, label <defaultdest> [ <intty> <val>, label <dest> ... ]

    ; Emulate a conditional br instruction
    %Val = zext i1 %value to i32
    switch i32 %Val, label %truedest [ i32 0, label %falsedest ]

    ; Emulate an unconditional br instruction
    switch i32 0, label %dest [ ]

    ; Implement a jump table:
    switch i32 %val, label %otherwise [ i32 0, label %onzero
                                        i32 1, label %onone
                                        i32 2, label %ontwo ]
  • invoke:

    <result> = invoke [cconv] [ret attrs] <ptr to function ty> <function ptr val>(<function args>) [fn attrs]
                  to label <normal label> unwind label <exception label>

    %retval = invoke i32 @Test(i32 15) to label %Continue
                unwind label %TestCleanup              ; {i32}:retval set
    %retval = invoke coldcc i32 %Testfnptr(i32 15) to label %Continue
                unwind label %TestCleanup              ; {i32}:retval set

Getting familiar with the LLVM API

code1 (the mul_add example from the LLVM 2.6 tutorial linked above; it uses the 2.6-era API, e.g. getGlobalContext() and the old PassManager, so it does not build as-is against current LLVM):

using namespace llvm;

Module* makeLLVMModule() {
  // Module Construction
  Module* mod = new Module("test", getGlobalContext());
  Constant* c = mod->getOrInsertFunction("mul_add",
  /*ret type*/                           IntegerType::get(32),
  /*args*/                               IntegerType::get(32),
                                         IntegerType::get(32),
                                         IntegerType::get(32),
  /*varargs terminated with null*/       NULL);

  Function* mul_add = cast<Function>(c);
  mul_add->setCallingConv(CallingConv::C);

  Function::arg_iterator args = mul_add->arg_begin();
  Value* x = args++;
  x->setName("x");
  Value* y = args++;
  y->setName("y");
  Value* z = args++;
  z->setName("z");

  BasicBlock* block = BasicBlock::Create(getGlobalContext(), "entry", mul_add);
  IRBuilder<> builder(block);

  // tmp = x * y; tmp2 = tmp + z; return tmp2
  Value* tmp = builder.CreateBinOp(Instruction::Mul,
                                   x, y, "tmp");
  Value* tmp2 = builder.CreateBinOp(Instruction::Add,
                                    tmp, z, "tmp2");

  builder.CreateRet(tmp2);

  return mod;
}

int main(int argc, char**argv) {
  Module* Mod = makeLLVMModule();

  verifyModule(*Mod, PrintMessageAction);

  PassManager PM;
  PM.add(createPrintModulePass(&outs()));
  PM.run(*Mod);

  delete Mod;
  return 0;
}

code2:

LLVM makes names unique automatically: the second block named "cond_false" and the repeated "tmp" values simply get a numeric suffix.

Module* makeLLVMModule() {
  // construct the module
  Module* mod = new Module("test", getGlobalContext());

  // construct the function: i32 gcd(i32, i32)
  Constant* c = mod->getOrInsertFunction("gcd",
  /*ret type*/                           IntegerType::get(32),
  /*args*/                               IntegerType::get(32),
                                         IntegerType::get(32),
  /*varargs terminated with null*/       NULL);

  // cast it to a Function
  Function* gcd = cast<Function>(c);

  // name the arguments
  Function::arg_iterator args = gcd->arg_begin();
  Value* x = args++;
  x->setName("x");
  Value* y = args++;
  y->setName("y");

  // create the basic blocks
  BasicBlock* entry = BasicBlock::Create(getGlobalContext(), "entry", gcd);
  BasicBlock* ret = BasicBlock::Create(getGlobalContext(), "return", gcd);
  BasicBlock* cond_false = BasicBlock::Create(getGlobalContext(), "cond_false", gcd);
  BasicBlock* cond_true = BasicBlock::Create(getGlobalContext(), "cond_true", gcd);
  BasicBlock* cond_false_2 = BasicBlock::Create(getGlobalContext(), "cond_false", gcd);

  // use an IRBuilder to fill the <entry> block: if (x == y) goto return; else goto cond_false
  IRBuilder<> builder(entry);
  Value* xEqualsY = builder.CreateICmpEQ(x, y, "tmp");
  builder.CreateCondBr(xEqualsY, ret, cond_false);

  // use <SetInsertPoint> to retarget the builder to another block
  builder.SetInsertPoint(ret);
  builder.CreateRet(x);

  // cond_false: if (x < y) goto cond_true; else goto cond_false_2
  // (every block needs a terminator, otherwise verifyModule rejects the function)
  builder.SetInsertPoint(cond_false);
  Value* xLessThanY = builder.CreateICmpULT(x, y, "tmp");
  builder.CreateCondBr(xLessThanY, cond_true, cond_false_2);

  // cond_true: return gcd(x, y - x)
  builder.SetInsertPoint(cond_true);
  Value* yMinusX = builder.CreateSub(y, x, "tmp");
  std::vector<Value*> args1;
  args1.push_back(x);
  args1.push_back(yMinusX);
  Value* recur_1 = builder.CreateCall(gcd, args1.begin(), args1.end(), "tmp");
  builder.CreateRet(recur_1);

  // cond_false_2: return gcd(x - y, y)
  builder.SetInsertPoint(cond_false_2);
  Value* xMinusY = builder.CreateSub(x, y, "tmp");
  std::vector<Value*> args2;
  args2.push_back(xMinusY);
  args2.push_back(y);
  Value* recur_2 = builder.CreateCall(gcd, args2.begin(), args2.end(), "tmp");
  builder.CreateRet(recur_2);

  return mod;
}

int main(int argc, char**argv) {
  Module* Mod = makeLLVMModule();

  verifyModule(*Mod, PrintMessageAction);

  PassManager PM;
  PM.add(createPrintModulePass(&outs()));
  PM.run(*Mod);

  delete Mod;
  return 0;
}
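As a cross-check of the control flow built above, here is the same function as plain C++ (gcd_reference is just an illustrative name). Each if corresponds to one of the blocks: entry tests x == y, cond_false does the unsigned x < y comparison (ICmpULT), and cond_true / cond_false_2 make the recursive calls. Like the IR version, it assumes x and y are both non-zero; with a zero argument the subtraction-based recursion never terminates.

```cpp
#include <cstdint>

// Plain C++ version of the gcd CFG built with IRBuilder above.
// Precondition: x != 0 and y != 0 (otherwise the recursion never ends).
std::uint32_t gcd_reference(std::uint32_t x, std::uint32_t y) {
  // entry: %tmp = icmp eq x, y ; br %tmp, return, cond_false
  if (x == y)
    return x;                        // return: ret x
  // cond_false: %tmp = icmp ult x, y ; br %tmp, cond_true, cond_false_2
  if (x < y)
    return gcd_reference(x, y - x);  // cond_true
  return gcd_reference(x - y, y);    // cond_false_2
}
```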


CMakeLists

Project layout when building the pass with CMake:

<project dir>/
  |
  CMakeLists.txt
  <pass name>/
    |
    CMakeLists.txt
    Pass.cpp
    ...

<project dir>/CMakeLists.txt:

find_package(LLVM REQUIRED CONFIG)

add_definitions(${LLVM_DEFINITIONS})
include_directories(${LLVM_INCLUDE_DIRS})

add_subdirectory(<pass name>)

cmake1 (the top-level CMakeLists.txt used here; it locates LLVM through $LLVM_HOME):

cmake_minimum_required(VERSION 3.4)
project(P1umer)

if(NOT DEFINED ENV{LLVM_HOME})
  message(FATAL_ERROR "$LLVM_HOME is not defined")
endif()
if(NOT DEFINED ENV{LLVM_DIR})
  set(ENV{LLVM_DIR} $ENV{LLVM_HOME}/lib/cmake/llvm)
endif()
find_package(LLVM REQUIRED CONFIG)
add_definitions(${LLVM_DEFINITIONS})
include_directories(${LLVM_INCLUDE_DIRS})
link_directories(${LLVM_LIBRARY_DIRS})

add_subdirectory(P1umer) # Use your pass name here.

cmake2 (P1umer/CMakeLists.txt):

add_library(P1umerPass MODULE
  # List your source files here.
  P1umer.cpp
)

# Use C++11 to compile your pass (i.e., supply -std=c++11).
target_compile_features(P1umerPass PRIVATE cxx_range_for cxx_auto_type)

# LLVM is (typically) built with no C++ RTTI. We need to match that;
# otherwise, we'll get linker errors about missing RTTI data.
set_target_properties(P1umerPass PROPERTIES
  COMPILE_FLAGS "-fno-rtti"
)

Writing an introductory pass

P1umer.cpp:

#include <cstdio>

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"
using namespace llvm;

namespace {
struct P1umerPass : public FunctionPass {
  static char ID;
  P1umerPass() : FunctionPass(ID) {}

  // Called once per module, before any runOnFunction().
  bool doInitialization(Module &) override {
    printf("11111111111\n");
    return false;
  }

  // Called once per module, after all runOnFunction() calls.
  bool doFinalization(Module &) override {
    printf("22222222222\n");
    return false;
  }

  // Called for every function; returning false means the IR was not modified.
  bool runOnFunction(Function &F) override {
    errs() << "I saw a function called " << F.getName() << "!\n";
    return false;
  }
};
}

char P1umerPass::ID = 0;

// Register the pass so that opt can run it with -P1umer.
// To run it automatically from clang instead, see
// http://adriansampson.net/blog/clangpass.html
static RegisterPass<P1umerPass> X("P1umer", "Hello P1umer");

Use it:

clang -c -emit-llvm hello.c -o hello.bc
opt -load ./libP1umerPass.so -P1umer hello.bc

output:

11111111111
I saw a function called mul_add!
22222222222


Writing a slightly more complex pass

Official documentation

code1: print every basic block of a function together with its instructions

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

namespace {

struct IterInsideBB : public FunctionPass {
  static char ID; // Pass identification, replacement for typeid
  IterInsideBB() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << "Function name: " << F.getName() << '\n';
    for (BasicBlock &BB : F) {
      errs() << "BasicBlock name = " << BB.getName() << "\n";
      errs() << "BasicBlock size = " << BB.size() << "\n\n";
      for (Instruction &I : BB)
        outs() << " " << I << "\n";
    }
    return false;
  }
};
}

char IterInsideBB::ID = 0;
static RegisterPass<IterInsideBB> X("IterInsideBB", "Iterate inside basicblocks inside a Function");

code2: print the operands (uses) of every add instruction

#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

namespace {

struct UseDef : public FunctionPass {
  static char ID; // Pass identification, replacement for typeid
  UseDef() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << "Function name: " << F.getName() << '\n';
    for (BasicBlock &BB : F) {
      for (Instruction &I : BB) {
        if (I.getOpcode() == Instruction::Add) {
          // Each Use links this instruction to a Value it reads.
          for (Use &U : I.operands()) {
            Value *v = U.get();
            outs() << *v << "\n";
          }
        }
      }
    }
    return false;
  }
};
}

char UseDef::ID = 0;
static RegisterPass<UseDef> X("UseDef", "This is use-def Pass");

v1<-loadint 8
v2<-operation add v1, v2
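The notation above describes a def-use relationship: v1 is defined by a load and then used as an operand of the add. The toy model below mirrors what the UseDef pass does without depending on LLVM; ToyInst and DefsOfAddOperands are invented for illustration, and values are identified by name instead of by llvm::Value pointers. For every operand of every add, it also follows the use back to the instruction that defined it.

```cpp
#include <map>
#include <string>
#include <vector>

// Invented stand-ins for llvm::Instruction and its operand list.
struct ToyInst {
  std::string name;                   // value defined by this instruction
  std::string opcode;                 // e.g. "load", "add"
  std::vector<std::string> operands;  // values it uses
};

// Like the UseDef pass, visit the operands of every "add"; additionally
// follow each use back to its definition inside the block. Returns the
// defining opcode per operand, or "<external>" when the value comes from
// outside the block (an argument, a global, another block).
std::vector<std::string> DefsOfAddOperands(const std::vector<ToyInst> &block) {
  std::map<std::string, std::string> def_opcode;  // value name -> defining opcode
  std::vector<std::string> result;
  for (const ToyInst &inst : block) {
    if (inst.opcode == "add") {
      for (const std::string &op : inst.operands) {
        auto it = def_opcode.find(op);
        result.push_back(it == def_opcode.end() ? "<external>" : it->second);
      }
    }
    def_opcode[inst.name] = inst.opcode;  // SSA: defined once, before its uses
  }
  return result;
}
```

In real LLVM no lookup table is needed: U.get() already is the defining Value, and dyn_cast<Instruction>(U.get()) returns the defining instruction when there is one.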
